INTRODUCTION


In the preceding paper, Dr. Swain described some of the research 
activities in Data Analysis techniques carried out at LARS/Purdue this 
past year. This paper continues by summarizing some results obtained 
from additional data processing research tasks now under study there. 

Time and space do not permit details. Full descriptions of these projects
are given in LARS reports and in papers in the open literature.

These studies fall into three categories: (1) an examination of
the suitability of several sensor types with regard to producing data
required for multispectral machine analysis; (2) various types of data
preprocessing necessary to prepare such data for analysis in this fashion;
and (3) an experiment in how to make this type of technology available
efficiently and inexpensively.


COMPARISON OF SENSOR TYPES 


A large number of different sensor types are capable of
producing data as input to machine analysis processors. These sensor
types tend to fall into three broad categories: scanners, photography
and television. The line scanner tends to be preferred for this type
of analysis procedure because it covers greater portions of the spectrum
and has greater dynamic range and radiometric precision. However,
scanners tend to be expensive, complex to operate and relatively
unavailable at this time. Photography, on the other hand, is relatively
inexpensive and widely available; it is a very well-developed technology.


*In this paper, results from a number of studies are summarized; 
researchers are identified in the Acknowledgment section. 
Television has still different characteristics in this regard, having
some of the advantages and disadvantages of both. In order to compare 
these sensors as sources of data for this type of machine processing 
and analysis, a scene was selected in which data had been gathered by 
both a scanner and photographic cameras simultaneously. Classifications 
with identical classes and training areas were carried out on both 
types of data. From this scene four different data types were to be 
compared: 

• scanner data; 

• black and white multispectral photography; 

• color infrared photography; and 

• vidicon-scanned color infrared photography.

The scanner data was used directly in this test. The black and 
white multispectral photography was first scanned, digitized and then 
the images from the several parts of the spectrum were registered with 
respect to one another. In the case of the color IR photography, color 
separations were first obtained and these were then scanned, digitized 
and registered. 

Unfortunately, television data was not collected simultaneously 
with the other types of data since an airborne television sensor system 
was not available to us. However, in order to obtain some idea of how 
television might have performed, the color photography was subjected to 
a vidicon scanning system* after which it was digitized and registered. 

The results of this study are shown in Figure 1 in bar graph form.
The results for individual classes are shown on the left, with the
overall average results shown on the right. In carrying out this study, it
was decided to assess performance based upon the accuracy achieved
on the samples used to train the classifier, rather than on so-called
test samples, that is, samples other than those used to train
the classifier. It was felt that doing so would minimize the
effect of scene variability, one of the other major experimental variables.

In the case of the scanner data, the best four of 15 available
spectral bands were selected using the divergence processor. These bands
turned out to be 0.44-0.46, 0.58-0.62, 1.0-1.4 and 1.5-1.8 micrometers.
The data was collected with the Michigan scanner system. Three bands
of black and white photography were available in a 70mm format. The
film types and Wratten filters used were: Green, 2402, 58; Red, 2402,
25A; and IR, 2424, 89B. The color photography portion of the experiment
was carried out using type 2443 color infrared film in a 9" by 9" format
with a Wratten 15 filter.


*The vidicon scanning of the photography was accomplished by the IBM 
Houston Science Center at no cost to Purdue.
It is seen that the classification accuracy as a whole was very
high. Thus, a two or three percent difference in overall accuracy is 
probably significant. 

The results do tend to verify what might be expected from a detailed
knowledge of the sensor type and processing algorithms, namely, that the
scanner produced the highest performance. This was followed by the black
and white multispectral photography; performance in this case was only
slightly greater than that of color photography, due no doubt to the
possibility of achieving slightly greater radiometric precision with a
single photographic emulsion as compared to a multiple-layer one. The
television-scanned data gave the poorest performance. While it must be
kept in mind that this data contained the variability factors of both the
photography and the television sensor, it nevertheless is to be expected
that performance of a television sensor for this type of analysis would
indeed be somewhat inferior to the others. No corrections were applied
to the data with regard to sun angle effect, vignetting or other types
of distorting factors.


DATA PREPROCESSING STUDIES 


There are a large number of parameters of the sensor and data
processing systems which are, at least initially, under the control of
the system designer. Such factors as the spectral and spatial resolution,
signal-to-noise ratio, the degree to which the signals are calibrated
against available standards, and many others have a direct bearing upon the
achievable classification accuracy. In order to determine the sensitivity
of the system to these various parameters and their overall effect,
a number of studies are underway at LARS/Purdue. An additional objective
of these studies is to determine suitable techniques by which data
preprocessing may be carried out to modify and optimize the data with
respect to these parameters. This is especially desirable since no one
system design will be universally optimal for all data analysis purposes.

Figure 2 shows the overall organization of these studies. They are
divided into four broad areas, each of which has several sub-parts. For
example, consider the programs in signal-to-noise ratio improvement
indicated on the left. By using adjacent scan lines of data it is
possible to improve the signal-to-noise ratio but, to some extent, at the
expense of spatial resolution. This technique is used to remove or
minimize the effect of such random noise as is generated in a scanner
detector, among other places.
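
The trade between signal-to-noise ratio and spatial resolution described
above can be illustrated with a short sketch. It is not the LARS
implementation; the function names, the three-line window and the synthetic
scene (a constant level of 100 with Gaussian noise) are assumptions made
only for illustration.

```python
import random
import statistics


def average_adjacent_lines(image, window=3):
    """Replace each scan line by the mean of up to `window` neighboring lines.

    `image` is a list of scan lines, each a list of samples. Near the top and
    bottom edges fewer lines are available, so the window is clipped.
    """
    half = window // 2
    n_lines = len(image)
    smoothed = []
    for i in range(n_lines):
        block = image[max(0, i - half):min(n_lines, i + half + 1)]
        smoothed.append([sum(column) / len(block) for column in zip(*block)])
    return smoothed


def noise_std(image, true_level):
    """Standard deviation of the samples about a known constant scene level."""
    return statistics.pstdev(v - true_level for row in image for v in row)


# Hypothetical test scene: constant level 100 with Gaussian noise (sigma = 5).
rng = random.Random(0)
noisy = [[100 + rng.gauss(0, 5) for _ in range(50)] for _ in range(200)]
smoothed = average_adjacent_lines(noisy, window=3)
print(noise_std(noisy, 100), noise_std(smoothed, 100))
```

For independent noise, averaging N lines reduces the noise standard
deviation by roughly the square root of N, while any scene detail finer
than N lines in the along-track direction is blurred.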

There are many types of systematic noise introduced by sensors.
Effects such as vignetting in photography or television, where the data
is collected in frames, or Moire patterns in line scanners are examples
of problems which can be minimized or removed through data processing,
but generally at the expense of other parameters in the system. The
objective here is to develop suitable techniques, to apply theory to
practice and to quantify the result of doing so. The remainder of
this section presents results obtained during the past year by
several of the studies indicated in Figure 2.


SCANNER DATA CALIBRATION STUDY 


Consider first a study of methods for the radiometric calibration
of scanner data. The results from this study are shown in Figure 3.
The data used for this study is from the Michigan scanner system and the
1971 Corn Blight Watch Experiment. With this scanner system the scanner
sensors are optically exposed to three different calibration sources
for each revolution of the scanning mirror. These are a black level
(C0), a level of fixed illumination (C1) and a sensor exposed to the
solar insolation on the top of the aircraft (C2). The C0 or black
level calibration is intended to be used to remove any DC drift which
may occur during the course of a data gathering mission by establishing
a reference level at the output for zero (optical) energy in. It is
possible to use either C1 or C2 to correct for any changes in system
gain and/or any changes in illumination occurring at the top of the
aircraft. In actuality, the computer software system used has been
arranged in such a way that any two of these three signals can be used
to establish calibration of the data in a linear fashion.
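
A linear calibration from two reference signals can be sketched as follows.
This is only an illustration of the idea, not the LARS software: the
function name, the target values of 0 and 1, and the sample numbers are
assumptions. The black level serves as one reference and either the lamp or
the sun sensor as the other.

```python
def calibrate_line(samples, ref_a, ref_b, target_a=0.0, target_b=1.0):
    """Linearly rescale one scan line so ref_a maps to target_a and ref_b to target_b.

    ref_a and ref_b are the two calibration readings recorded with this scan
    line (for example the black level and the sun sensor reading).
    """
    if ref_a == ref_b:
        raise ValueError("the two calibration references must differ")
    span = ref_b - ref_a
    return [target_a + (s - ref_a) / span * (target_b - target_a) for s in samples]


# Two hypothetical scan lines of the same scene; the second has drifted
# upward by 8 counts and its gain has increased by 10 percent.
line_1 = [12, 62, 112]
line_2 = [20, 75, 130]
print(calibrate_line(line_1, ref_a=12, ref_b=112))
print(calibrate_line(line_2, ref_a=20, ref_b=130))
```

Both lines calibrate to the same values because the drift and gain change
appear equally in the references. With noisy references, however, the
reference noise is carried into every corrected sample.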

When considering data calibration, however, one must recognize that
calibration levels can only be determined with a finite
signal-to-noise ratio, just as is the case for
the data from the scene itself. Thus, one will be using calibration
information of a given signal-to-noise ratio to correct scene data of a
given signal-to-noise ratio and, depending on the need of the system for
calibration, the effect may either improve or degrade the overall
performance of the system. The question posed in this study then is: In a
given situation does calibration help, and if so, which type helps the
most?


Three different data sets were used in this particular test. These 
data sets are from segments 206, 208 and 215 of the 1971 Corn Blight 
Watch Experiment.* In each case training samples from a given segment 


*Segments 201 through 230 of the 1971 Corn Blight Watch Experiment were
distributed in order from north to south along the western third of the
State of Indiana. These three segments are therefore from the northern
half of the state and are separated by a maximum of about 100 miles.
Each segment is approximately 1 by 8 miles.
were selected for a corn vs. noncorn classification. The classification
was carried out, and then samples from fields other than those used for
training were used to assess the accuracy.

Shown in Figure 3 are the results of the test, with the individual
segment accuracies shown on the left and the overall average shown on the
right. The overall average does indeed indicate that calibration helps,
although note that in segment 215 the "no calibration" control
classification provided a higher performance than any of the types of
calibration. Overall there was a slight preference for
using the C0 - C1 calibration.

Based on the design of the sensor system, the signal-to-noise ratio
of the C1 calibration signal is, in general, poorer than that from the
sun sensor C2. That is to say, given a higher quality signal from the
calibration lamp, the slight preference for C0 - C1 calibration seen in
this case might become even more pronounced. It is important
to add, however, that in the case of aircraft data, as one collects data
from larger and larger areas, variations in solar illumination become
more important, and it may become more desirable to use C0 - C2
calibration in order to achieve highest accuracy. This proved to be the
case in the next experiment to be described.


EXTRAPOLATION OF TRAINING SAMPLES 


Over how large a geographical area is a given training set valid?
This is a very important question. Only a relatively small proportion
of the total cost of processing the data at the present time is
attributable to the actual analysis calculation itself. This is true
whether the analysis is done by analog or digital mode. The expensive
portion of the processing remains the training phase of the analysis.
Thus, it is important to develop techniques which reduce the complexity,
and therefore the cost, of the training phase, and it is also desirable
to develop techniques by which a single training of a classifier can be
utilized over larger and larger geographical areas. It was this latter
question which was examined in the study to be described
now. Many of the details of the study are apparent from the results
displayed in Figure 4. The ordinate displays test data accuracy. The
abscissa indicates the segment of the 1971 Corn Blight Watch Experiment
data from which training was derived. The legend of the graph indicates
the segments classified. The following table indicates the distance
separating each pair of segments.
Number of Airmiles Separating the
Centers of Four Segments of the
1971 Corn Blight Watch Experiment

Segment
Number      208     215     230

206          28      90     235
208                  68     193
215                         126


As expected, it appears to be generally true that the farther one
gets from the training sample area the poorer the accuracy. However,
there are several factors to be kept in mind in this particular test
in considering how rapidly the accuracy deteriorates with distance.
First of all, the segments are distributed in a north-south direction.
This is the direction of maximum change with regard to seasonal
variation. One would expect the growing season to be approximately
two weeks further advanced at the southernmost segment on a given day
than at the northernmost. Second, while all data used is from the
same mission period of the Corn Blight Watch, it did not prove possible
to gather all the data on the same day. Indeed, the data gathering
was extended over a 13-day period and, in addition, the data from the
north, or least advanced portion of the growing season, was gathered
first, with that in the south, or most advanced, being gathered last.
This tended to enhance seasonal variations with regard to crop maturity.
All data was gathered between 10:30 and 11:45 a.m. Thus there were only
relatively small changes in sun angle. It did prove desirable in this
case to use C0 - C2 calibration, that is, calibration which
removes DC drift using the black level calibration information and
adjusts the overall system gain based upon the indicated solar
illumination as determined at the top of the airplane.

These results, together with the earlier results shown, tend to be
encouraging with regard to the extrapolation of training sets, particularly
in view of the improvement expected when satellite-gathered data becomes
available. Perhaps one of the greatest advantages of the satellite, one 
which is not achievable with an aircraft system, is that data over very 
large areas can be gathered in a very short time, thus holding as nearly 
constant as possible many experimental variables of the system such as 
the sun angle, time of the growing season, etc. 
SUN ANGLE EFFECT CORRECTION


Attention has often been drawn to the fact that the reflectance of 
earth surface materials is very dependent upon the angle of illumination 
relative to the angle of view. This fact has a pronounced effect upon 
all types of imagery. The effect on scanner imagery is illustrated in 
Figure 5. The left image in this figure shows uncorrected data which
was gathered by scanner at 10:00 a.m. This data was gathered from an 
aircraft having a northern heading such that at this hour of the morning 
the sun was to the right and somewhat to the rear (due to the season and 
latitude of the flightline) of the aircraft. The fact that the image 
appears washed out on the left is not an artifact of image reproduction, 
but sun angle effect which is present in the data. 

Briefly, since the line of view is nearly parallel to the illuminating 
rays of the sun on the left portion of the image, the amount of energy 
reflected from any given (rough surface) material tends to be greater. 

This effect is even more apparent by viewing the average response 
from a large number of scan lines. Figure 6 shows such a presentation in 
graphical form. Plotted here is the average response for a large number 
of columns of data in the imagery plotted versus the column number. For 
this data set, gathered shortly after 9:00 a.m., the response is clearly 
greater on the left, or west, portion of the field of view than it is on 
the right, or east. Figures 7 and 8 show presentations of the same type
for data gathered near local noon and late in the afternoon, respectively.
Variation of this effect with time of day is readily apparent.

This effect has been known for some time and it is relatively easy 
to improve the appearance of the imagery by appropriate processing. The 
most successful technique found to date has been to use a characteristic 
curve such as Figure 6 to derive appropriate multiplicative correction 
factors for each column in the data. The result of having applied
this technique to the data for the leftmost image of Figure 5 is shown
in the center of Figure 5. It is obvious that the appearance of the
image is greatly improved. However, a careful inspection of the image
will reveal that a vertical line structure has been introduced into it
because the characteristic curve was not sufficiently smooth.
If, prior to applying the correction, this characteristic curve is
smoothed appropriately, the result achieved will be as shown in the
right-hand image of Figure 5.
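
The column-wise correction just described can be sketched as follows. This
is a minimal illustration under stated assumptions, not the procedure
actually used: the characteristic curve is taken to be the per-column mean,
the smoothing is a simple clipped moving average, and the correction target
is the mean of the smoothed curve. All function names are hypothetical.

```python
def column_means(image):
    """Average response of each column over all scan lines (the characteristic curve)."""
    return [sum(column) / len(column) for column in zip(*image)]


def smooth(curve, window=5):
    """Moving average of the curve; the window is clipped at both ends."""
    half = window // 2
    n = len(curve)
    out = []
    for i in range(n):
        segment = curve[max(0, i - half):min(n, i + half + 1)]
        out.append(sum(segment) / len(segment))
    return out


def sun_angle_correct(image, window=5):
    """Multiply each column by (overall level / smoothed column level).

    Assumes every smoothed column level is nonzero.
    """
    curve = smooth(column_means(image), window)
    target = sum(curve) / len(curve)
    factors = [target / level for level in curve]
    return [[v * f for v, f in zip(row, factors)] for row in image]


# Hypothetical scene whose response falls off steadily from left to right.
scene = [[200, 180, 160, 140, 120, 100] for _ in range(4)]
print(column_means(sun_angle_correct(scene, window=1)))
```

A wider window trades exactness of the column correction for freedom from
the vertical line structure that an unsmoothed curve introduces.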

Perhaps more important than the appearance of the image, however, is
the quality of the data itself. A more quantitative test of the relative
effectiveness of this type of correction can be obtained by carrying out
a classification of the data set into appropriate classes for
both corrected and uncorrected data. Two examples of the result of
doing this are shown in Figure 9. In this case, the results for a corn
versus noncorn classification carried out on two different data sets are
shown, and in each case the classification of the original data is
compared with that of the sun-angle-corrected data. Note that the
correction of the data
did result in a significant improvement in accuracy for the classification
indicated by the two pairs of bars on the left. However, the other
classification shows a degradation of performance due to data correction.
Note that the degradation occurred in the data set collected earlier in
the morning, when the sun angle effect would be more severe.

The purpose of this is to illustrate the point that though satisfactory 
improvement in image appearance is possible, the results with regard to 
improving data quality are quite mixed. It must be kept in mind that the 
degree to which the illustrated sun angle effect takes place depends not 
only upon the angle relationship between the sun, the scene and the 
observer but also the contents of the scene itself. Individual areas 
within the scene will display this effect to a greater extent depending 
on their contents. By the procedure described, although a globally 
appropriate correction can be made, there is no information available 
by which to make the correction also locally correct. Since classification 
is made on a local basis it is local correction that is required. Best 
overall results can be obtained by defining pairs of classes - one set for 
the left side of the scene, the other for the right side - but at the 
expense of training and processing complexity. It is felt, therefore, 
that the problem is in an unsatisfactory state and a new approach is 
really needed, one no doubt less empirically based, but based on an 
appropriate model of the total situation. 


DATA COMPRESSION TECHNIQUES 


We turn now to the question of data compression. One of the most 
obvious characteristics of the remote sensing field is the large quantity 
of data. This data quantity tends to strain system resources especially 
with regard to data transmission and data storage and retrieval. If 
means for compressing the data can be found which do not significantly alter 
the data quality, it would be most valuable. This led us to begin a 
data compression study and in particular to examine a class of linear 
transformations for this purpose. Among other advantages, this approach 
would tend to minimize the amount of additional processing which would 
be involved. 

There exists in the literature a particular signal representation
scheme known as the Karhunen-Loeve orthogonal expansion. Theoretical
results available with regard to this expansion suggested a number of
advantages, and I shall briefly describe the technique. In this
application it amounts simply to a principal components transformation,
and this is illustrated in Figure 10. Suppose, for example, we have some
multispectral data in two spectral bands. Since multispectral data is
typically correlated from channel to channel, it may distribute itself
as shown by the oval shaped distribution in this figure. A principal
components transformation amounts to defining a new set of axes by
taking a linear combination of the original axes. The computation
involved is shown by the two equations in the lower part of the figure.
The a coefficients in these equations determine the orientation of the
new axes relative to the original ones.

A principal components transformation is that particular transformation 
by which the first new axis, y, in the figure, is chosen so as to be 
oriented along the direction of maximum range or spread of the data as 
shown in the figure. The second component is chosen perpendicular to 
the first but in the direction of the next most principal distribution 
of the data. In higher dimensional cases, succeeding axes continue to 
be chosen orthogonal to the prior ones but in the direction of maximum 
remaining range of distribution. 
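
For the two-band case, the transformation can be sketched in a few lines.
This is an illustrative sketch only: it uses variance as the measure of
spread, removes the band means before rotating, and the coefficient names
a11, a12, a21 and a22 are assumed labels that may not match the notation of
Figure 10.

```python
import math


def principal_axes_2band(samples):
    """Rotate two-band samples onto their principal axes.

    Returns (theta, transformed): theta is the angle of the first new axis
    relative to band 1, and transformed holds (y1, y2) for each sample after
    the band means have been removed.
    """
    n = len(samples)
    m1 = sum(x1 for x1, _ in samples) / n
    m2 = sum(x2 for _, x2 in samples) / n
    c11 = sum((x1 - m1) ** 2 for x1, _ in samples) / n
    c22 = sum((x2 - m2) ** 2 for _, x2 in samples) / n
    c12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in samples) / n
    # Rotation angle that diagonalizes the 2 x 2 covariance matrix.
    theta = 0.5 * math.atan2(2 * c12, c11 - c22)
    a11, a12 = math.cos(theta), math.sin(theta)
    a21, a22 = -math.sin(theta), math.cos(theta)
    transformed = [
        (a11 * (x1 - m1) + a12 * (x2 - m2), a21 * (x1 - m1) + a22 * (x2 - m2))
        for x1, x2 in samples
    ]
    return theta, transformed


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical, perfectly correlated two-band data lying on the line x2 = x1.
theta, y = principal_axes_2band([(1, 1), (2, 2), (3, 3), (4, 4)])
print(theta, variance([p[0] for p in y]), variance([p[1] for p in y]))
```

In this extreme example all of the variance falls on the first component and
essentially none on the second, which is the property that the compression
scheme exploits.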

The usefulness of this transformation for the data compression
problem comes about because multispectral data, typically being highly
correlated between spectral bands, tends to fall in a relatively long,
narrow distribution in n-dimensional space. Thus, it is possible to
concentrate most of the dynamic variability of the data in a very small
number of principal components. This is dramatically illustrated in
Figure 11, which is a plot of one measure of the dynamic range of the
data after transformation as a function of the principal component number.
This data was from 12 spectral bands, and it can be seen that after
transformation only about 3 coordinates have any appreciable dynamic
distribution of data. The concept for data compression purposes is to
transform this 12-spectral band data into 12 new components and then
discard those, in this case 9, which have essentially no range and therefore
no information in them. In this case, a 12 to 3 or 4 to 1 compression
ratio would be achieved while incurring only the small error indicated by
the sum of the mean square values of the 9 discarded components as compared
to the 3 retained ones.

A further compression can also be achieved by a process called "bit
allocation." For example, suppose the original 12 band data had been
represented to an 8 bit precision in each of the 12 bands; that is, in
each band any one of 256 possible gray values is allowed for. This would
be 8 bits times 12 bands or 96 bits per multispectral sample.

Again referring to Figure 11, certainly the dynamic range of the
first principal component after transformation would be much larger than
that of any of the original spectral bands. Therefore, in order to
achieve the same precision of data representation, more bits would need to
be assigned to this component. However, successively fewer could be
assigned to successive components. It may turn out, for example, that
for the 3 most principal components, the dynamic range required in terms
of the number of bits might be 9, 6 and 4, allowing for 512, 64 and 16
gray values respectively for the three most principal components. This
total new bit allocation of 9 plus 6 plus 4 equals 19 bits and compares
with the 8 plus 8 plus 8 equals 24 which might have been used had
bit allocation not been considered. This represents not only an additional
compression of data but also an improvement in the precision with which
the data would be represented, since the original 8 bits in the first
principal component would not have been adequate to handle properly the
larger dynamic range.
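
The arithmetic of this example can be checked with a short sketch. The bit
counts are those quoted in the text; the uniform quantizer and its name are
assumptions added only to show how a component might be coded with a given
number of bits.

```python
def quantize(value, low, high, bits):
    """Uniformly quantize value in [low, high] to an integer code of `bits` bits."""
    levels = 2 ** bits
    code = int((value - low) / (high - low) * levels)
    return min(levels - 1, max(0, code))  # clamp so value == high stays in range


original_bits = 12 * 8            # 12 bands at 8 bits each
uniform_bits = 3 * 8              # 3 retained components at 8 bits each
allocation = [9, 6, 4]            # bits per retained component, from the text
allocated_bits = sum(allocation)
gray_values = [2 ** b for b in allocation]

print(original_bits, uniform_bits, allocated_bits, gray_values)
print(original_bits / uniform_bits, round(original_bits / allocated_bits, 2))
```

With these figures the transformation alone gives a ratio of 4 to 1, and
bit allocation raises it to roughly 5 to 1 (96 bits down to 19).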

Figure 12 shows a block diagram of a test data compression system
which has been assembled in order to test this general approach on earth
resources multispectral data. The first two blocks indicate the two
steps just described, namely, data transformation followed by bit
allocation. If it is then desired to recover the original data, the
next step would be an inverse transformation followed by a bit allocation,
thus transforming the compressed data back to the original coordinate
system and the original dynamic range.

Two additional features not previously described have been incorporated 
into this system. First of all, though the above description of the 
concept involves compression based only on spectral redundancy, it is 
possible to use this approach to take advantage of both spectral redundancy 
and spatial redundancy in the imagery data. It is apparent from the above 
description that basically the only requirement at the input is for the 
data to be in the form of a vector representation. The components of 
the vector may be spectral components as described above, but they may 
also have been derived by using groups of vectors in spatial proximity to 
one another. For example, if the data is composed of samples from a
12-band multispectral scanner, inputs to this data compression system could
be assembled by taking pairs of adjacent points, thus creating
24-dimensional input vectors. Indeed, the system of Figure 12 is prepared
to handle data from an arbitrary number of lines, an arbitrary number
of columns and an arbitrary number of channels of adjacent sample points.
In this way, in addition to spectral redundancy, spatial redundancy can
also be used to achieve higher compression ratios.
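
Assembling such input vectors can be sketched as follows. The data layout
(a list of lines, each a list of points, each a list of channel values),
the non-overlapping blocks and the function name are assumptions for
illustration; the system of Figure 12 may organize its data differently.

```python
def block_vectors(image, block_lines=1, block_cols=2):
    """Group non-overlapping blocks of adjacent points into single vectors.

    image[line][col] is the list of channel values for one sample point.
    Each output vector concatenates the channels of every point in a block,
    so its length is block_lines * block_cols * number_of_channels.
    """
    n_lines, n_cols = len(image), len(image[0])
    vectors = []
    for i in range(0, n_lines - block_lines + 1, block_lines):
        for j in range(0, n_cols - block_cols + 1, block_cols):
            vector = []
            for di in range(block_lines):
                for dj in range(block_cols):
                    vector.extend(image[i + di][j + dj])
            vectors.append(vector)
    return vectors


# Hypothetical 2-line by 4-column image with 12 channels per point.
image = [[[1000 * i + 100 * j + c for c in range(12)] for j in range(4)]
         for i in range(2)]
pairs = block_vectors(image, block_lines=1, block_cols=2)
print(len(pairs), len(pairs[0]))
```

Pairs of adjacent 12-band points give 24-dimensional vectors, as in the
example in the text; a block of 8 by 8 points in 2 bands would give
128-dimensional vectors.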

Another feature of the system is that other transformations besides
the Karhunen-Loeve (principal components) transformation have been
implemented with the system. The principal components transformation is
of a type referred to as a data-dependent transformation, in that the precise
coefficients for the transformation are computed each time based upon
calculations involving the total data set to be transformed. However,
any transformation could be used, and at least two other transformations
have been examined besides the one just described. These are the standard
Fourier (harmonic analysis) transformation and one called the Hadamard
transform. The Fourier transform was selected for tests because of its
familiarity and the robustness the transformation has shown with regard
to a large class of problems extending across the various fields of
science. The Hadamard transform, on the other hand, was selected because
of its extreme convenience, since the Hadamard functions and, therefore,
the coefficients involved, are always either plus one or minus one, thus
making a digital implementation of this transform extremely simple and
efficient from the computational standpoint. Both of these transforms are
non-data dependent; that is, they require no prior computation of scene
statistics before proceeding with the transformation. In addition to these
two, an "average" Karhunen-Loeve transform could be used, based on scene
statistics from a so-called average scene. This would be another means
of eliminating the need for a precalculation of scene statistics, at the cost
of some degradation from optimum performance.
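
The computational simplicity of the Hadamard transform can be seen in a
short sketch. In the standard definition the matrix entries are plus or
minus one, so the transform needs only additions and subtractions. The fast
butterfly form below, in natural (unnormalized) ordering, is one common way
to compute it and is not necessarily the form used in the system described.

```python
def hadamard_transform(vector):
    """Unnormalized Walsh-Hadamard transform of a vector whose length is a power of two."""
    n = len(vector)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vector)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                a, b = out[k], out[k + h]
                out[k], out[k + h] = a + b, a - b  # only adds and subtracts
        h *= 2
    return out


def inverse_hadamard_transform(coefficients):
    """Applying the same transform again and dividing by the length inverts it."""
    n = len(coefficients)
    return [c / n for c in hadamard_transform(coefficients)]


coefficients = hadamard_transform([1, 0, 1, 0])
print(coefficients, inverse_hadamard_transform(coefficients))
```

Because no scene statistics enter the computation, the same routine applies
to any data set, which is the non-data-dependent property noted above.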

In addition to data compression for data storage and retrieval purposes,
a procedure such as the above could have several other advantages. For
example, in Figure 12 the data transformation output could be used directly
for feature selection and classification purposes. Assuming 12-dimensional
original data, one problem immediately to be faced in preparing to analyze
the data set is which spectral bands will be best in a given classification.
A somewhat lengthy computational procedure is available for determining
the optimum subset of spectral bands; however, the data in
principal components form can be used directly, in that the first n principal
components tend always to produce classification accuracies which are at or
above the accuracy obtainable with the same number of optimally
chosen original spectral bands. Thus, in addition to accomplishing data
compression, this scheme shows promise for eliminating the need for the
optimum feature selection computation. This property of the transformation
has been known for some time and has been used previously by other researchers
elsewhere.

Further, since after data transformation the coordinate system, and
therefore the components involved, have been oriented to have maximum
dynamic range, if an image is constructed using the data from the first
principal component at this point, the image will have the greatest dynamic
range and therefore the greatest scene contrast of any possible image
presentation of the data. One is guaranteed of having greater image contrast
than any one of the original spectral bands could have. This provides imagery
useful in determining, for example, boundaries in the scene and manually
determining differences between any two materials. Figure 13 shows images
constructed from data using the first, second, third and twelfth principal
components. These images were produced by attempting to spread whatever
dynamic range is present in the data over the full range of the contrast
available in the photographic film. Notice that the scene contrast of the
first principal component image is markedly greater than that in the third
and slightly greater than that in the second. Notice also that the
twelfth principal component has essentially no image detail at all.

How does one determine the effectiveness of a proposed data compression 
scheme? The questions which must be asked are: how much distortion does 

the compression scheme introduce into the data and how much does this 
distortion affect the potential classification accuracy and the image 
quality. Figure lb gives the rate distortion characteristics for this 
compression scheme and therefore shows the relationship of the degree of 
compression obtained to the amount of distortion introduced. The distor- 
tion is assessed by determining the mean square difference between the 
original image and the compressed and reconstructed image. In this case, 
the original data was available at eight bits -per-s ample precision. In 
this case, the lower and upper curves are bounds on possible performance. 

The lower curve represents a best performance theoretical limit based upon 
information theory considerations after appropriate assumptions. The 
upper curve, on the other hand, is the result achieved by simply truncating 
the number of bits per sample in the data. In between them, then, lie 
the performance characteristics for the three transformations under 
consideration. The Karhunen-Loeve transformation clearly provides the best 
performance. However, keep in mind that it is a data dependent 
transformation. The Fourier and Hadamard transforms show the penalty of 
achieving data independence. These are the performance characteristics 
based upon using a rectangular region out of the image of size 8 samples 
by 8 samples by 2 spectral bands to construct the vector which undergoes 
the transformation. This appears to be a near optimal choice of 
combination between spectral and spatial redundancy. 
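
The upper bound described above, simple truncation of the number of bits 
per sample, can be illustrated with a short Python sketch. It is a minimal 
illustration under stated assumptions and not the procedure used in the 
study: the function names are hypothetical, the samples are synthetic 
eight-bit values rather than scanner data, and each truncated sample is 
reconstructed at the midpoint of its interval, a choice the text does not 
specify. 

```python
def truncate_bits(sample, keep_bits, total_bits=8):
    """Keep the top keep_bits of an unsigned sample.

    The dropped low-order bits are replaced by the interval midpoint
    (an assumption; other reconstruction rules are possible).
    """
    drop = total_bits - keep_bits
    if drop == 0:
        return sample
    return ((sample >> drop) << drop) + (1 << (drop - 1))


def mean_square_distortion(original, reconstructed):
    """Mean square difference between two equally long sample lists."""
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)


# Synthetic data: every eight-bit value exactly once.
samples = list(range(256))

for keep in range(8, 0, -1):
    rebuilt = [truncate_bits(s, keep) for s in samples]
    factor = 8 / keep  # compression relative to eight bits per sample
    distortion = mean_square_distortion(samples, rebuilt)
    print(keep, round(factor, 2), distortion)
```

Each printed row gives the bits kept, the resulting compression factor and 
the mean square distortion, that is, one point of a truncation curve for 
this synthetic data. 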

Figure 15 shows the results of carrying out classification tests on 
the compressed data. A test classification was carried out using various 
numbers of features. In the case of the original data, the optimal spectral 
bands were first determined in each case. On the other hand, in the case 
of the principal components data, components were added in order as higher 
dimensional classifications were desired, thus indicating the lack of 
necessity for the calculation of optimal spectral bands in this case. 
Notice that the principal component classifications were always at least 
as high and usually higher in performance compared to the same number of 
components of original data. Notice also that the classification with 
three spectral bands was approximately as high as that achieved with any 
number of bands. 
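
The reason components can simply be added in order is that the 
transformation ranks them by the range (variance) of the data along each 
new axis. The following Python sketch shows this for the two-band case of 
Figure 10. It is a sketch under assumptions: the function name and the 
sample values are hypothetical, the covariance is computed as a population 
covariance, and the closed-form solution used here applies only to two 
bands, whereas the study used 12. 

```python
import math


def principal_components_2band(x1, x2):
    """Rotate two-band samples onto their principal axes.

    Returns the eigenvalues (largest first), the coefficients
    ((a11, a12), (a21, a22)) and the transformed samples y1 and y2,
    where y1 = a11*x1 + a12*x2 and y2 = a21*x1 + a22*x2.
    """
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    # Elements of the 2 x 2 covariance matrix.
    s11 = sum((a - m1) ** 2 for a in x1) / n
    s22 = sum((b - m2) ** 2 for b in x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n

    # Closed-form eigenvalues of a symmetric 2 x 2 matrix.
    half_trace = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Angle of the axis along which the variance is largest.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    a11, a12 = math.cos(theta), math.sin(theta)
    a21, a22 = -math.sin(theta), math.cos(theta)

    y1 = [a11 * a + a12 * b for a, b in zip(x1, x2)]
    y2 = [a21 * a + a22 * b for a, b in zip(x1, x2)]
    return eigenvalues, ((a11, a12), (a21, a22)), y1, y2


# Hypothetical, strongly correlated two-band samples.
band1 = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
band2 = [11.0, 12.5, 15.0, 15.5, 18.5, 19.5]
eigenvalues, coefficients, y1, y2 = principal_components_2band(band1, band2)
print("eigenvalues:", eigenvalues)
print("coefficients:", coefficients)
```

For correlated bands such as these, nearly all of the variance falls in the 
first component, which is why the first few components alone can support a 
classification. 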

And finally, Figure 16 shows the result of carrying out a compression 
by a factor of eight and then reconstructing the image from the data. This 
is compared with the image made from the original data. In weighing the 
small amount of distortion present, keep in mind that if eight digital 
tapes had originally been required to store the data, only one would be 
necessary using this procedure. 


TEMPORAL INFORMATION AND THE IMAGE OVERLAY PROCESS 


At each of the previous annual meetings, we have reported 
on progress towards achieving a suitable capability for the precision 
overlaying of one image of a given geographic area onto another image 
of the same area. These images may be from different parts of the 
spectrum and/or from data gathered at different times. This has proved 
to be very challenging, primarily since the precision required is better 
than plus or minus one resolution element; thus, in the overlay process 
one must cause all local distortions of one image to precisely conform 
to those of the other. 

Though achieving image overlays of this precision is likely to be a 
relatively expensive processing step, there are a number of important 
advantages which will accrue as a result. For example, 
temporal information could be made available to the processor by having 
images from different time periods registered with respect to one another. 
Also, the need to correlate ground truth information with data from a new 
mission can be eliminated by overlaying the new image data onto an image 
for which the correlation has already been established. And, in the case 
of airborne scanner data, which often contains unacceptably high geometric 
distortions, these distortions can be removed by simply overlaying the 
data onto an image which is of high geometric quality. By overlaying the new 
data onto existing maps of areas, procedures could be established 
whereby the maps could be automatically updated after subsequent analysis 
of the data. 

Before describing new procedures developed this past year for 
improving the overlay quality possible, an example result in the use of 
temporal information will be shown. This result is given in Figure 17. 
Data from Missions 43M, 44M, 45M and 46M of the University of Michigan 
airborne scanner system, gathered over segment 208 during the 1971 Corn 
Blight Watch Experiment, were overlaid upon one another. This figure shows 
the result of carrying out classifications for corn versus non-corn for 
various subsets of features in this total data set. 

First of all, beginning on the left, classifications for each 
individual mission period were carried out using the best four channels 
from each mission period as chosen by the divergence processor. It is 
seen that the performance was relatively high on Mission 43M but dropped 
considerably by the next mission and then began a slow rise. These results 
indicate that some times of the growing season are better for making 
these discriminations than others; in addition, part of the difference in 
performance in these four classifications is due to differences in the 
quality of the data caused by such factors as weather conditions. 

The fifth set of bar graphs indicates the result of using all of the 
preceding data for a classification; that is, a 16-band classification, 
with four bands coming from each of the four mission periods. Notice that 
the capability for classifying corn and the overall accuracy were indeed 
considerably higher in this case. 
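
Once the missions have been overlaid, forming the 16-band data set amounts 
to concatenating, pixel by pixel, the four channels selected from each 
mission. The Python sketch below shows only this bookkeeping step. The 
function name, the nested-list data layout and the sample values are 
hypothetical, and the sketch assumes the registration has already been 
carried out. 

```python
def stack_missions(missions):
    """Combine registered missions into one feature vector per pixel.

    missions: a list of missions; each mission is a list of pixels and
    each pixel is a list of channel values. All missions must cover the
    same pixels in the same order, i.e. they must already be overlaid.
    """
    pixel_counts = {len(mission) for mission in missions}
    if len(pixel_counts) != 1:
        raise ValueError("missions must be registered to the same pixel grid")
    stacked = []
    for pixel_views in zip(*missions):
        vector = []
        for channels in pixel_views:
            vector.extend(channels)
        stacked.append(vector)
    return stacked


# Synthetic example: 4 missions, 3 pixels, 4 selected channels each.
# The value encodes mission, pixel and channel so the order is visible.
missions = [
    [[m * 100 + p * 10 + c for c in range(4)] for p in range(3)]
    for m in range(4)
]
stacked = stack_missions(missions)
print(len(stacked), "pixels with", len(stacked[0]), "features each")
```

Each resulting vector can then be passed to the classifier exactly as a 
16-channel measurement from a single mission would be. 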

One additional question has been posed of this data set so far. It 
was hypothesized that not all 16 channels were necessary; that is, that 
the data is not really intrinsically 16-dimensional. The last classification 
was carried out using the best four of the 16 channels. The results show 
an overall performance down from the previous classification and also down 
from the results obtained for Mission 43M alone. This would tend to 
suggest that more than 4 of the 16 bands are indeed significant in this 
case. Much more needs to be determined about temporal information and 
its value. 

With regard to the overlay procedure itself, a new element has been 
added to the technique in order that a larger number of different situations 
can be successfully handled. This work began originally with scanner data 
in mind. More recently, scanned photography and television imagery have also 
been successfully overlaid. 

Figure 18 shows the steps now used in the overlay procedure. First, 
initial checkpoints or points of obvious image congruency are marked 
manually. Based on these checkpoints in the two images to be overlaid, 
a curve-fitting operation is carried out to find the best fit between the 
images from this initial information. More specifically, several coefficients 
in the curve-fitting operation are computed. Next, a fast Fourier transform 
two-dimensional correlation is carried out between the two images over a 
uniform grid to obtain precision checkpoints. This correlation uses the 
initial overlay previously determined by the checkpoints in order to 
minimize the region which must be searched for a maximum of correlation. 

As a result of this correlation operation, a final overlay function is 
computed and the two images are then merged to achieve the final overlay 
of the images. 
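
The correlation step can be sketched in Python as follows. This is a 
simplified illustration and not the LARS implementation: it works on a 
single scan line (one dimension) rather than on a two-dimensional region, 
it treats the line as circular, and all names and test values are 
hypothetical. The parameter max_shift plays the role of the restricted 
search region provided by the initial checkpoints. 

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    The inverse transform is left unnormalized, which does not affect
    the location of the correlation maximum.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def best_shift(reference, moved, max_shift):
    """Return the shift s, |s| <= max_shift, that maximizes the circular
    correlation sum(reference[i] * moved[i + s]).

    A result of s means a feature at column i of the reference appears
    at column i + s of the moved line.
    """
    n = len(reference)
    ref_mean = sum(reference) / n
    mov_mean = sum(moved) / n
    f_ref = fft([r - ref_mean for r in reference])
    f_mov = fft([m - mov_mean for m in moved])
    # Correlation theorem: the transform of the correlation is
    # conj(F_ref) * F_mov.
    product = [a.conjugate() * b for a, b in zip(f_ref, f_mov)]
    corr = fft(product, inverse=True)
    candidates = range(-max_shift, max_shift + 1)
    return max(candidates, key=lambda s: corr[s % n].real)


# Synthetic scan line and a copy displaced by three samples.
reference = [float((37 * i + 11 * i * i) % 101) for i in range(64)]
moved = reference[-3:] + reference[:-3]
print("estimated shift:", best_shift(reference, moved, max_shift=5))
```

Repeating such a search at each point of a uniform grid yields the set of 
precision checkpoints from which the final overlay function is computed. 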

Figure 19 shows the actual overlay function which would be required 
for some photographic data from the Apollo 9 SO65 experiment. Briefly, 
frame 3808 from the Lubbock, Texas area was scanned and digitized at a 
rate of approximately 2100 scan lines by 2100 samples per scan line. 

Shown here in the two curves is the variation in registration in terms 
of columns (samples). From the upper curve, it is seen that for channels 
1 and 2, the green and red, respectively, when the left edge and right 
edge of the two images were in proper registration, the center of the 
frame was out of registration by as much as four samples. On the other 
hand, from the lower curve comparing channel 1 with channel 3, the green 
with the infrared, respectively, the misregistration in this case was also 
by as much as four samples but in the opposite direction. Thus, it would 
be necessary to change the local distortion of the red and the infrared 
channels so that it more properly corresponds with that of the green channel 
prior to the actual image overlay process. The entire process is, of course, 
carried out digitally. 
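
A correction of this kind can be sketched in Python as follows. The sketch 
rests on assumptions that go beyond the text: the misregistration along a 
scan line is modeled as a quadratic that is zero at both edges and largest 
at the center, resampling is done by linear interpolation, and the function 
names and the sign convention are hypothetical. The overlay function 
actually fitted in the study may have a different form. 

```python
def quadratic_offset(width, center_offset):
    """Offset in samples for each column of a scan line.

    The offset is zero at the first and last column and equals
    center_offset halfway between them (an assumed model).
    """
    last = width - 1
    return [4.0 * center_offset * c * (last - c) / (last * last)
            for c in range(width)]


def resample_line(line, offsets):
    """Build a corrected line where output[c] is the input value at
    position c + offsets[c], using linear interpolation.

    Positions outside the line are clamped to its ends.
    """
    last = len(line) - 1
    corrected = []
    for c, offset in enumerate(offsets):
        position = min(max(c + offset, 0.0), float(last))
        low = int(position)
        high = min(low + 1, last)
        fraction = position - low
        corrected.append((1.0 - fraction) * line[low] + fraction * line[high])
    return corrected


# A 2100-sample line with a four-sample misregistration at the center.
offsets = quadratic_offset(2100, 4.0)
print(offsets[0], round(offsets[1050], 3), offsets[-1])

# Hypothetical channel values: a simple ramp, for illustration only.
red_line = [float(c) for c in range(2100)]
corrected_red = resample_line(red_line, offsets)
```

For the infrared channel, where the misregistration was in the opposite 
direction, the same functions would be called with a negative 
center_offset. 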


ON THE AVAILABILITY OF TECHNOLOGY 


One final experiment which is now underway will be described. The 
problem is as follows: research into techniques for the machine 
processing of earth resources data has now been underway for several years 
and significant new technology is now available. How can this 
technology become available to the user community? In considering this 
question, it was decided to analyze how this had been accomplished at 
Purdue between the data processing specialists and user scientists. 

The elements for the availability of this technology are hardware, 
software and the knowledge or training on how to use the system. 
Hardware, at least in the form of general purpose computers, is readily 
available, but expensive. The transfer of software is more of a problem. 
The implementation of a large software system on a new computer is a 
relatively expensive process requiring special data processing expertise. 
It is also relatively expensive to maintain the software once it has been 
implemented. 

Insofar as the third element, training, is concerned, it was possible 
at LARS to give individual attention to training each new staff member in 
the use of the system. However, for the transfer of technology to a 
large body of people, this technique would be too expensive and slow. 

This led to the proposal of a specific experiment in the transfer of 
technology. The concept is illustrated in Figure 20. It became 
apparent that the hardware a user scientist needs to have available is 
a card reader and punch, a typewriter and a line printer; in short, the 
I/O devices. Thus, it is possible to centralize not only the computational 
capability, but also the data storage capability required. Such 
a system would then have the following advantages: (1) full user access 
to both the data and the processing capability at the user's location; 
(2) centralization of the expensive portions of the hardware at considerable 
cost advantages; (3) centralization of software maintenance, again 
achieving a cost advantage plus a flexibility in updating; and (4) 
facilitation of training through commonality of data format, terminology 
and simplicity of communication. As a result of this commonality, 
standard training materials tailored to the specific system could be 
developed and the amount of teacher time per pupil could be greatly 
reduced by relying on training materials. The computer itself can be 
used for training purposes through computer-aided instruction. 

The status of this experiment is as follows. It was authorized by 
NASA Headquarters two years ago. On January 1, 1971, an IBM System/360 
Model 67 time share system was placed on line in a minimal configuration 
in order to appropriately prepare the software system. The 1971 Corn 
Blight Watch Experiment necessitated a delay in the experiment since 
both the equipment and the personnel involved were required for the 
Watch. However, recently the final hardware was installed and is now 
ready. Some training materials are already ready while others are in 
preparation. The locations for the first terminals are now being selected 
by NASA/HQ. It is expected that the experiment will be underway by the 
time ERTS is launched. 


[Figure: Comparison of Sensor Types, August 26, 1970 Data. Legend: 
Aircraft Scanner; Color IR Precision Scan; B&W Multiband Precision Scan; 
Vidicon Scan of Color IR. Classes: Corn, Soybeans, Pasture, Trees, Average.] 

Figure 1.- Results of comparative classifications of 
multispectral data from four sensor types. 



Figure 2.- Organization of multispectral data preprocessing 
studies being pursued at LARS/Purdue at this time. 

[Figure: Scanner Data Calibration Study, Corn vs Non-Corn, 
July 31 - Aug 5, 1971, 10:30-11:45 AM.] 

Figure 3.- Results of classifications from three different 
flightlines comparing four different calibration procedures. 


[Figure: Extrapolation of Training Samples.] 

Figure 4.- Results of a test of extrapolating training data 
from one flightline to another. Segment 20t is more than 
200 miles from segment 230; however, several factors, in 
addition to distance, are significant in this test. 

[Figure panels: Uncorrected; Corrected; Corrected with Smoothed Function.] 


Figure 5.- An illustration of the use of preprocessing techniques 
to improve the appearance of imagery affected by 
variation in reflectance due to view angle and sun angle 
relationships (channel 6, segment 221, mission 42M, July 27, 
1971). 

[Figure: Effect of Sun Angle on Aircraft Scanner Data. Column Mean 
Graph for Channel 6.] 

Figure 6.- A graph showing the mean spectral response as a function 
of view angle for data collected with an early morning sun angle. 
In this case, sufficient data has been used to nearly average 
out effects due to individual surface cover materials, leaving 
only the sun angle effect. 


[Figure: Effect of Sun Angle on Aircraft Scanner Data. Column Mean 
Graph for Channel 6. Axis labels: West, Nadir, East.] 

Figure 7.- A graph similar to Figure 6, but for data 
gathered with a nearly local noon sun angle. 

[Figure: Effect of Sun Angle on Aircraft Scanner Data.] 



Figure 8.- A graph similar to Figure 6, but for data 
gathered with a late afternoon sun angle. 

Figure 9.- Results of two corn vs. non-corn classifications 
carried out on the original data and on data corrected by the sun angle 
correction procedure used in Figure 5. Though the imagery appearance is 
obviously improved, the results from the quantitative 
classification comparison are mixed. 

[Figure: Karhunen-Loeve (Principal Components) Transformation. 
Equations at the bottom of the figure:] 

    Y_1 = a_{11} x_1 + a_{12} x_2 
    Y_2 = a_{21} x_1 + a_{22} x_2 

Figure 10.- A sketch of hypothetical bivariate multispectral data 
illustrating the result of a principal component transformation. 
X_1 and X_2 were the original coordinate axes; Y_1 and Y_2 are the 
new ones. The necessary equations are at the bottom. 



Figure 11.- The eigenvalues of an actual 12-band multispectral 
data set. An eigenvalue in this case is an indicator of the 
relative range of the data after principal components 
transformation. Even though there were 12 bands before 
transformation, only three appear to have significant range 
after. 

[Figure: Test Data Compression System.] 

Figure 12.- Organization of a system used to test data compression 
transforms. Fourier and Hadamard transforms have been implemented 
in addition to the Karhunen-Loeve. 








[Figure panels: First Component; Second Component; Third Component; 
Twelfth Component.] 

Figure 13.- Images generated after the data has first undergone 
a principal components transform. The first two have higher 
contrasts than any of the original 12 spectral bands. 

[Figure: Rate-Distortion Characteristics, Aircraft Scanner Data, 
Log-Log Axes.] 

Figure 14.- Comparative rate distortion characteristics for the 
three transformations tested. Distortion is measured as the 
mean square difference between the original image and the 
compressed and reconstructed version. 


[Figure: Classification Tests, Aircraft Scanner Data.] 

Figure 15.- Comparative results between classifications using 
original data and identical ones using principal components 
data. The best subsets of spectral bands were selected using 
a divergence processor. 

[Figure panels: Original; 8 X Compression.] 


Figure 16.- The results of data compression on image quality. 
A compression factor of 8 to 1 was used (Apollo 9 Frame 
No. 3698A). 

[Figure: Test Field Classification Accuracy for Segment 208, Spectral vs. 
Spectral + Temporal. Bar groups: 43M, 44M, 45M and 46M (best 4 channels 
as chosen by divergence processor); All Spectral + Temporal (16 chan); 
Best 4 Spectral + Temporal. Bars: Corn, Other, Overall. Mission dates: 
43M - Aug 13; 44M - Aug 13; 45M - Sept 14; 46M - Sept 24.] 


Figure 17.- A group of classification results illustrating the 
relative value of temporal information. The numbers 43M-46M 
are mission numbers flown on the dates shown. 

[Figure: Image Overlay Procedure.] 



Figure 18.- The steps used in the current image overlay system. 

[Figure: Apollo 9 Digitized Image Correlation, Frame 3808, Lubbock, Texas, 
Line 1864. Axis: Column.] 

Figure 19.- Curves obtained with the image overlay system. They 
show that even when the particular frames were properly aligned 
at the left and right edges, they were as much as four samples 
(out of 2100) out of alignment in the middle. Thus, before 
overlaying, a translation of the center portion of one image 
with respect to the other must be carried out in each case. 

Figure 20.- The layout of equipment for the Multiterminal 
Processing System Experiment. The experiment will test the 
feasibility of centralizing the data bank and computational 
facility while providing input-output and control of that 
computational facility at multiple remote locations. 







